Practical issues in quantum-key-distribution post-processing 
Quantum key distribution (QKD) is a secure key generation method between two distant parties
by wisely exploiting properties of quantum mechanics. In QKD, experimental measurement out- 
comes on quantum states are transformed by the two parties to a secret key. This transformation 
is composed of many logical steps (as guided by security proofs), which together will ultimately 
determine the length of the final secret key and its security. We detail the procedure for performing 
such classical post-processing taking into account practical concerns (including the finite-size effect 
and authentication and encryption for classical communications). This procedure is directly appli- 
cable to realistic QKD experiments, and thus serves as a recipe that specifies what post-processing 
operations are needed and what the security level is for certain lengths of the keys. Our result is 
applicable to the BB84 protocol with a single or entangled photon source. 

PACS numbers: 03.67.Dd, 03.67.Hk 



I. INTRODUCTION 

In theory, a few quantum key distribution (QKD) protocols, such as BB84 [?], BBM92 [?], B92 [?] and
six-state [4, 5], have been proven to be unconditionally secure in the last decade [?]. The
security of other protocols, such as the Ekert91 protocol [?] and the device-independent QKD
protocol [?], has also been studied. For a review of QKD, one can refer to Ref. [?].

QKD schemes can be classified into two types: prepare- 
and-measure scheme and entanglement-based scheme. In 
the former, one party, Alice, prepares the quantum sig- 
nals (say, using a laser source) according to her basis and 
bit values and sends them through a quantum channel to 
the other party, Bob, who measures them upon reception. 
In the latter type, an entanglement source emits pairs 
of entangled signals, which are then measured in certain 
bases chosen by Alice and Bob separately. There is an important difference in terms of security
between the emitted signals with practical sources in the two cases. In the prepare-and-measure
case, the signals emitted by Alice (say, from a weak coherent-state source) are basis-dependent,
meaning that the coherent-state signal corresponding to one basis is quantum mechanically
different from that of the other basis. An eavesdropper, Eve, can certainly leverage this
information to her advantage. New techniques such as the decoy-state method [?],
strong-reference-pulse QKD and DPS [?] have recently been invented to allow the use of
coherent-state sources securely. On the other hand, entanglement-based QKD involves signals
that are basis-independent. The security of basis-independent QKD (including
entanglement-based QKD with a PDC source and prepare-and-measure QKD with a single-photon
source) has been proven in Ref. [34], and the performance of entanglement-based QKD with a
PDC source has been analyzed recently [35].

As the security analysis of QKD has matured, it is now time to examine all the underlying
assumptions and to apply the theoretical results to practical QKD experiments. Although
standard security proofs (such as Ref. [?]) imply a procedure for distilling a final secret key
from measurement outcomes, such a procedure cannot be directly carried out in an actual QKD
experiment because many of the security proofs focus on the case that the key is arbitrarily
long. Although, in theory, there is no fundamental limit on the length, it is constrained by
the computational power in practice. Therefore, it is imperative to quantify the finite-size
effect and to provide a precise post-processing recipe that one can follow for distilling final
secret keys with quantified security in real QKD experiments. This is the motivation of this
paper. We remark that it is not that the security proofs are incorrect, but that carrying them
out in practice requires additional consideration of the relation between the actual steps
taken and the final security parameter.

Ultimately, QKD system designers would like to know 
the classical computation and communications needed to 
transform the measurement results of a QKD experiment 
to a final key. Furthermore, it is important to know the 
trade-off between the final key length and the security 
parameter, since this allows one to estimate the number 
of initial quantum signals to be sent in order to achieve 
a certain final key length and security. We provide the 
solution to this in the current paper. 

It is important to note that the post-processing procedure contains many elements including
authentication, error correction and verification, phase error rate estimation, and privacy
amplification. Integrating all these elements with a security proof is nontrivial and the
resultant post-processing procedure is the main contribution of our paper. Our paper uses the
latest security proof techniques to perform post-processing analysis, along a similar line to
an early work by Lütkenhaus [36]. We emphasize that the main focus of our work is the overall
procedure for post-processing, rather than analyzing a security proof in the finite-key
situation. We note that, recently, much effort has been devoted to the finite-key effect in
QKD post-processing, such as Refs. [37, 38, 39, 40].

The finite-key-length analysis is important not only from a theoretical point of view, but
also for experiments. For example, the efficient BB84 protocol [41] was proposed to increase the
key generation rate. In order to select an optimal bias between the two bases, X and Z, Alice
and Bob need to consider statistical fluctuations. We will address this issue in this paper.
We remark that the proposed post-processing scheme ties together a few existing results with
some modifications. Key features of our work are as follows:

1. a strict bound for the phase error estimation is de- 
rived; 

2. an authentication scheme is applied for the error 
verification; 

3. the efficiency of the privacy amplification is inves- 
tigated; 

4. parameter optimization is studied. 

This paper is an expansion of our shorter paper [42] which 
summarizes the essential components of our data post- 
processing procedure. All the technical details related to 
the procedure are presented here. 

The paper is organized as follows. Sec. II introduces the assumptions used in the paper.
Sec. III discusses the security aspect of our procedure. Sec. IV outlines the post-processing
procedures. Sec. V introduces some preliminary tools to be used in later sections. Secs. VI-X
discuss the details of the various post-processing steps. In Sec. XI, we investigate the
parameter optimization problem for this post-processing procedure. In Sec. XII, we simulate an
experiment setup as an example. We conclude in Sec. XIII.

II. ASSUMPTIONS 

Here, we examine the underlying assumptions used in 
the post-processing scheme we propose in this paper. We 
emphasize that in order to apply the scheme to a QKD 
system, one needs to compare these assumptions with the 
real setup. The assumptions used in the paper are listed 
as follows: 

1. Alice and Bob perform the BB84 protocol with a perfect single photon source (or
basis-independent photon source [?]);

2. The detection system is compatible with the squashing model [?] (see also Ref. [45]). For
example, the efficiency mismatch is not considered in this paper;

3. Alice and Bob use perfect random number gener- 
ators and perfect key management. They share a 
certain amount of secure key prior to running their 
QKD system. 

III. SECURITY ASPECT 

Our data post-processing procedure is derived from entanglement distillation protocol
(EDP)-based security proofs [?] (also see [47]) and thus our procedure is secure against the
most general attacks allowed by the laws of quantum mechanics. The original idea [8] casts QKD
as distilling Einstein-Podolsky-Rosen (EPR) pairs between Alice and Bob, which involves
correcting general quantum errors. The ability to correct general quantum errors is equivalent
to the ability to correct bit and phase errors [?]. Later, Shor and Preskill [?] show that
correcting bit errors and phase errors in the EDP picture corresponds to bit error correction
and privacy amplification in distilling a secret key. Thus, proving the security of QKD can be
cast as showing that both bit and phase errors are corrected in the EDP picture. In this way,
provided that a quantum error correction code exists for the specific bit and phase error
rates in the EDP picture, the security of the corresponding QKD protocol is proved. However,
this places a rather strong requirement on the quantum error correction code since constraints
on both bit and phase error rates have to be satisfied. Fortunately, Lo [46] further shows that
bit and phase errors can be decoupled by simply encrypting the bit error syndrome transmission
(without affecting the net key generation rate). Koashi [47] adopts the same decoupling
mechanism and further generalizes the notion of phase errors with a simple and yet powerful
argument. In this paper, we follow this line of security proofs in our finite-key analysis.
Essentially, the important ingredients in our analysis are

• encrypting the bit error syndromes; 

• using a random sampling argument to place bounds 
on the phase error rate; 

• using a privacy amplification scheme (with struc- 
ture) and placing bounds on its phase error cor- 
recting capacity; 

• integrating authentication in classical communica- 
tions. 



A. Composable security 

The finite-key analysis is closely related to the definition of security. Currently, the
composable security definition of QKD [?] is widely accepted as the most stringent security
definition in the field. QKD is composable in the sense that the final key generated is
indistinguishable from an ideal secret key except with a small probability, and thus the key
can be used in a subsequent cryptographic task where an ideal key is expected. A secret key is
considered ideal if it is identical between Alice and Bob and is private to Eve (i.e., Eve has
no information on it). The notion of composability was first proposed in the classical setting
for the study of security when composing classical cryptographic protocols in a complex manner
[52, 53]. Composability has also been carried over to the quantum setting [50, ?]. One
essential feature of the composable security definition is that it characterizes the security
of a protocol with respect to the ideal functionality. In particular, the security of a
composable secret key generated by QKD is measured with the trace distance between the real
situation with the real key and the ideal situation with an ideal key [12, 51, 55].

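Definition 1 ([12, 51, 55]). A random variable $V$ (the classical key) drawn from the set
$\mathcal{V}$ is said to be $\zeta$-secure with respect to an eavesdropper holding a quantum
system $E$ if

$$\frac{1}{2}\,\mathrm{Tr}\,|\rho_{VE} - \rho_U \otimes \rho_E| \le \zeta \qquad (1)$$

where $\rho_{VE} = \sum_{v \in \mathcal{V}} p_V(v)\,|v\rangle\langle v| \otimes \rho_E|_{V=v}$,
$\rho_U = \sum_{v \in \mathcal{V}} |v\rangle\langle v| / |\mathcal{V}|$ represents an ideal key
taking values uniformly over $\mathcal{V}$, and $|\mathcal{V}|$ is the size of $\mathcal{V}$.
Here, $\mathrm{Tr}\,|A| = \sum_i |\lambda_i|$, where $\lambda_i$ are the eigenvalues of $A$.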

Since the trace distance $\frac{1}{2}\,\mathrm{Tr}\,|\rho - \sigma|$ is the maximum probability
of distinguishing between the two quantum states $\rho$ and $\sigma$, this security definition
naturally gives rise to the operational meaning that the $\zeta$-secure key $V$ is identical to
an ideal key $U$ except with probability $\zeta$. This trace-distance security parameter is
additive when practical cryptographic protocols are composed [50]. That is to say, suppose we
have a key generation protocol (e.g., QKD) and a second cryptographic protocol which consumes
an ideal secret key. Furthermore, we suppose that the first protocol realizes an ideal key
generation scheme with a security parameter $\zeta_1$ for a particular secret key output, and
also the second protocol realizes its ideal functionality with a security parameter $\zeta_2$.
Then, when the two protocols are composed (i.e., the imperfect key generated in the first
protocol is used in the second protocol), the overall security parameter will become
$\zeta_1 + \zeta_2$.

Our paper is based on EDP-based proofs which often justify security with the fidelity between
Alice and Bob's state and the ideal state (the perfect EPR pairs). This fidelity is a direct
consequence of the failure probability of the post-processing procedure. Thus, we need to find
a connection of this failure probability with the composability definition in Definition 1.

The generation of the final key in one round of QKD is composed of many steps (cf. Sec. IV and
Table I), and each step carries a certain failure probability. This probability represents the
case that Alice and Bob believe the step has succeeded when it actually has not (in other
words, a case of undetected failure). A detected failure in any step will lead to a premature
termination of the QKD process. Successes of all steps will result in a perfect final key that
is private to Eve and identical between Alice and Bob. However, since each step may fail
without being detected, there is a certain probability that the final key fails to be perfect
and this probability is upper bounded by the sum of the failure probabilities of all the steps.
Essentially, this sum is the failure probability $\varepsilon$ of the entire post-processing
procedure, which needs to be converted to the security parameter of the final key.

In the context of Koashi's proof [47], success of the phase error correcting step as part of
the post-processing procedure guarantees that Alice's $m$-qubit state $\rho_A$ can be corrected
to the pure state $|0^{\otimes m}\rangle$. The final key is then generated by $m$ measurements
in the $Z$-basis on $\rho_A$. Since the entire post-processing procedure fails with probability
at most $\varepsilon$, the component of Alice's state $\rho_A$ corresponding to the pure state
$|0^{\otimes m}\rangle$ must satisfy
$\langle 0^{\otimes m}|\rho_A|0^{\otimes m}\rangle \ge 1 - \varepsilon$. In order to make the
connection with the universal composability definition in Definition 1, it has been suggested
[55] (also see [50]) that a bound on the trace distance is obtained from the fidelity using a
general inequality relating them [56]:

$$\frac{1}{2}\,\mathrm{Tr}\,|\rho - \sigma| \le \sqrt{1 - F(\rho, \sigma)^2} \qquad (2)$$

where $F(\rho, \sigma) = \mathrm{Tr}\sqrt{\rho^{1/2} \sigma \rho^{1/2}}$ is the fidelity between
$\rho$ and $\sigma$. Thus, we seek the minimum of the fidelity between $\rho_{AE}$ and
$|0^{\otimes m}\rangle_A\langle 0^{\otimes m}| \otimes \rho_E$ in order to get an upper bound on
their trace distance, in accordance with Definition 1. Since the fidelity never decreases
under a trace-preserving quantum operation (i.e.,
$F(\mathcal{E}(\rho), \mathcal{E}(\sigma)) \ge F(\rho, \sigma)$), system $E$ can be considered
to be the entire purification of system $A$ when the minimum occurs. Assuming this worst case,
the joint state is of the form

$$|\Psi_{AE}\rangle = \sqrt{\alpha}\,|0^{\otimes m}\rangle_A |0\rangle_E + \sqrt{1 - \alpha}\,|\Psi^{\perp}\rangle_{AE} \qquad (3)$$

where $|\Psi^{\perp}\rangle_{AE}$ has unit norm,
${}_A\langle 0^{\otimes m}|\Psi^{\perp}\rangle_{AE} = 0$, and $\alpha \ge 1 - \varepsilon$. The
fidelity between the real situation and the ideal situation is (see Appendix A for proof)

$$F(\rho_{AE}, |0^{\otimes m}\rangle_A\langle 0^{\otimes m}| \otimes \rho_E) \ge \alpha \qquad (4)$$

$$\ge 1 - \varepsilon \qquad (5)$$

where $\rho_{AE} = |\Psi_{AE}\rangle\langle\Psi_{AE}|$ and $\rho_E = \mathrm{Tr}_A(\rho_{AE})$.
Thus, an upper bound on the failure probability provided by the EDP-based proofs can easily be
translated to a composable security measure. By substituting Eq. (5) for the fidelity in
Eq. (2) and using the fact that projection onto the eigenstates of the $Z$-basis corresponding
to the final measurement does not increase the trace distance [79], we conclude that the final
key is $\sqrt{\varepsilon(2 - \varepsilon)}$-secure in accordance with Definition 1.
Lemma 1. When the failure probability of the post-processing procedure is $\varepsilon$, the
final key is $\sqrt{\varepsilon(2 - \varepsilon)}$-secure in accordance with Definition 1.
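The step from Eqs. (2) and (5) to the lemma can be written out explicitly. The following is a short worked derivation using only the quantities defined above; it assumes $0 \le \varepsilon \le 1$, so that $1 - \varepsilon$ is non-negative and squaring preserves the inequality.

```latex
\begin{align*}
\tfrac{1}{2}\,\mathrm{Tr}\,|\rho - \sigma|
  &\le \sqrt{1 - F(\rho,\sigma)^2}
     && \text{Eq. (2)} \\
  &\le \sqrt{1 - (1-\varepsilon)^2}
     && \text{since } F \ge 1-\varepsilon \ge 0 \text{ by Eq. (5)} \\
  &= \sqrt{2\varepsilon - \varepsilon^2}
   = \sqrt{\varepsilon(2-\varepsilon)} .
\end{align*}
```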

We can apply this lemma when many rounds of QKD are composed. Suppose the post-processing of
each round fails with a probability $\varepsilon$, and Alice and Bob plan to use a QKD system a
million times in the manner that the secret key output of one round is fed as input to the
next. Since the trace-distance measure is additive when cryptographic protocols are composed
[50], the trace-distance security parameter for the key in the last round will be
$10^6 \sqrt{\varepsilon(2 - \varepsilon)}$. Note that this security parameter increases linearly
with the number of rounds of the QKD system. This linear dependence is an important feature of
the composable security definition.
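As a numerical illustration of Lemma 1 and of the additive composition just described, the following minimal Python sketch converts a failure probability into the trace-distance parameter and scales it by the number of rounds. The function names and the sample value of the failure probability are illustrative assumptions, not taken from the paper.

```python
import math


def trace_distance_parameter(eps: float) -> float:
    """Return sqrt(eps * (2 - eps)), the security parameter of Lemma 1."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    return math.sqrt(eps * (2.0 - eps))


def composed_parameter(eps: float, rounds: int) -> float:
    """Additive composition over `rounds` chained QKD rounds,
    each failing with probability eps."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return rounds * trace_distance_parameter(eps)


if __name__ == "__main__":
    eps = 1e-20  # hypothetical per-round failure probability
    print("single round :", trace_distance_parameter(eps))
    print("10^6 rounds  :", composed_parameter(eps, 10**6))
```

Because the parameter scales as the square root of the failure probability for small values, halving the target security parameter requires roughly a fourfold reduction of the failure probability.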

We note that Mayers' security proof [?] has also implicitly mentioned using failure
probability to quantify the security.

B. Equivalence of the failure probability and the 
trace distance as the optimization objective 

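The failure probability of the post-processing procedure $\varepsilon$ is related to the
trace-distance security parameter $\zeta$ by $\zeta = \sqrt{\varepsilon(2 - \varepsilon)}$.
Since $d\zeta/d\varepsilon > 0$ and $d^2\zeta/d\varepsilon^2 < 0$ for $0 < \varepsilon < 1$, we
have $\zeta^{(1)} > \zeta^{(2)} \Leftrightarrow \varepsilon^{(1)} > \varepsilon^{(2)}$ and there
is a one-to-one mapping between these two measures. Thus, minimizing either $\varepsilon$ or
$\zeta$ subject to the same constraints (such as a fixed key length) will produce the same
solution.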

C. Simple lower bound on failure probability 

It is easy to lower bound the failure probability as a function of the secret-key cost
$k_{\mathrm{initial}}$ by considering that Eve has a $2^{-k_{\mathrm{initial}}}$ chance of
guessing the right initial secret key and thus will be able to launch a man-in-the-middle
attack successfully. Therefore, the failure probability of any post-processing scheme should
be at least $2^{-k_{\mathrm{initial}}}$. Moreover, the failure probability of our scheme
exhibits the same exponential decrease as the lower bound (see the various constituent failure
probabilities $\varepsilon$'s listed in Table I).



IV. OUTLINE OF POST-PROCESSING 
PROCEDURE 

The post-processing procedure is listed as follows. We remark that each communication between
Alice and Bob consists of a message and an authentication tag, each of which may be of zero
length. In our scheme, a tag is transmitted if and only if authentication is used and in this
case the authentication tag is always encrypted by a one-time pad, consuming some pre-shared
secret bits. When a message is transmitted, it may or may not be encrypted, and it is assumed
to be unencrypted unless otherwise stated. Note that no message but a tag is transmitted in
the error verification step (step 4). Fig. 1 shows the flow chart of our data post-processing
procedure.

1. Key sift [not authenticated]: Alice sends $N$ quantum signals to Bob, of which $n$ signals
produce clicks. Bob discards all no-click events and obtains an $n$-bit raw key by randomly
assigning values to the double-click events [80]. Note that other key sift procedures might be
applied as well (see, for example, Ref. [13]).

2. Basis sift [authenticated]: Alice and Bob send each other $n$-bit basis information. At the
end of this step, Alice and Bob obtain an $n_x$ ($n_z$)-bit sifted key in the $X$ ($Z$)-basis.
Define the bias ratio to be $q_x = n_x/(n_x + n_z)$. Note that this bias ratio is different
from the probabilities with which Alice and Bob choose the two bases. Define the probability
that Alice and Bob choose the $X$-basis to be $p_x$; then, in the long-key limit,
$q_x = p_x^2/[p_x^2 + (1 - p_x)^2]$.

3. Error correction [not authenticated but encrypted, Section VII]: Alice and Bob perform
error correction so that Bob's raw key matches Alice's. The classical messages exchanged in
this process are encrypted. If error correction fails, Alice and Bob abort the QKD process.

4. Error verification [Section VIII]: Alice and Bob want to make sure (with a high
probability) that their keys after the error correction step are identical. If error
verification fails, Alice and Bob either go back to the error correction step or abort the QKD
process. We note that the idea of using error verification to replace error testing was
proposed by Lütkenhaus [36].

5. Phase error rate estimation [no communication, Section IX]: Alice and Bob use the bit error
rate measured in the $X$ ($Z$)-basis to infer the phase error rate in the $Z$ ($X$)-basis. The
uncertainty in bounding the phase error rates is quantified by a random sampling argument.

6. Privacy amplification [authenticated, Section X]: Alice generates an
$(n_x + n_z + l - 1)$-bit random bit string and sends it to Bob through an authenticated
channel. Alice and Bob use this random bit string to generate a Toeplitz matrix. The final key
(with a size of $l$) will be the product of this matrix (with a size of $(n_x + n_z) \times l$)
and the key string (with a size of $n_x + n_z$).

7. The final secure key length (net growth) is given by 

$$NR \ge l - k_{bs} - k_{ec} - k_{ev} - k_{pa} \qquad (6)$$
FIG. 1: Flow chart of our data post-processing procedure [58].



with a failure probability of 



(7) 



where $l$ is given by Eq. (32). Here, the $k$'s are the secret-key costs and the
$\varepsilon$'s are the failure probabilities for steps 2-6 (see Table I). Throughout the
paper, $\varepsilon$'s with various subscripts stand for various failure probabilities.
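Two of the quantities in the list above lend themselves to a quick numerical check: the long-key-limit bias ratio of step 2 and the number of random bits Alice sends in step 6. The sketch below is a minimal illustration; the function names are assumptions introduced here and do not appear in the paper.

```python
def bias_ratio(p_x: float) -> float:
    """Long-key-limit bias ratio q_x = p_x^2 / (p_x^2 + (1 - p_x)^2).

    p_x is the probability that each party chooses the X basis; a signal
    survives basis sifting only when both parties pick the same basis.
    """
    if not 0.0 <= p_x <= 1.0:
        raise ValueError("p_x must lie in [0, 1]")
    both_x = p_x * p_x
    both_z = (1.0 - p_x) * (1.0 - p_x)
    return both_x / (both_x + both_z)


def privacy_amplification_seed_length(n_x: int, n_z: int, l: int) -> int:
    """Number of random bits defining the Toeplitz matrix in step 6."""
    return n_x + n_z + l - 1


if __name__ == "__main__":
    for p_x in (0.5, 0.7, 0.9):  # hypothetical basis-choice probabilities
        print(p_x, bias_ratio(p_x))
    print(privacy_amplification_seed_length(1000, 9000, 5000))
```

The sifted-key bias is more extreme than the basis-choice bias: choosing one basis 90% of the time puts roughly 98.8% of the sifted key in that basis.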



V. PRELIMINARY 

A. Data representation 

Data are represented as matrices or column vectors of 0's and 1's. Additions are carried out
modulo 2. For example, a raw key $x$ is multiplied by a privacy amplification matrix $M$ to
generate the final key $y = Mx$, where the $i$th bit of $y$ is
$\sum_j M(i,j)\,x(j) \bmod 2$.
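The modulo-2 product described above can be sketched in a few lines of Python. Bits are stored as plain lists of 0/1 integers; this representation and the function name are choices made for the illustration only.

```python
from typing import List


def matvec_mod2(M: List[List[int]], x: List[int]) -> List[int]:
    """Compute y = M x over GF(2): y[i] = sum_j M[i][j] * x[j] mod 2."""
    if any(len(row) != len(x) for row in M):
        raise ValueError("each row of M must have len(x) entries")
    # For bits, multiplication is AND and addition mod 2 is XOR (parity).
    return [sum(m & b for m, b in zip(row, x)) % 2 for row in M]


if __name__ == "__main__":
    M = [[1, 0, 1],
         [1, 1, 0]]
    print(matvec_mod2(M, [1, 1, 1]))
    print(matvec_mod2(M, [1, 0, 0]))
```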



B. Toeplitz matrices 

In our framework, we rely heavily on a particular class 
of hash functions to perform phase error correction, error 
verification, and authentication. We are interested in 
using sets of Toeplitz matrices to perform these tasks. 



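Toeplitz matrices are Boolean matrices with a special structure:

$$M = \begin{pmatrix}
a_0 & a_{-1} & a_{-2} & \cdots & a_{-m+1} \\
a_1 & a_0 & a_{-1} & \cdots & a_{-m+2} \\
a_2 & a_1 & a_0 & \cdots & a_{-m+3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{l-1} & a_{l-2} & a_{l-3} & \cdots & a_{l-m}
\end{pmatrix} \qquad (8)$$

where $a_i \in \{0, 1\}$. It can also be concisely described by $M_{(i,j)} = a_{i-j}$, where
$M_{(i,j)}$ is the $(i,j)$th element of $M$.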

The advantage of using Toeplitz matrices is that they can be specified by a small number of
parameters, namely, $m + l - 1$ bits, as opposed to $ml$ bits for completely random matrices.
Hashing of a given column vector $x$ (whose elements are 0 or 1) can be performed by choosing
a Toeplitz matrix $M$ randomly and computing the hash value as $Mx$.
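A minimal sketch of hashing with a randomly chosen Toeplitz matrix is given below. It builds an $l \times m$ matrix from $m + l - 1$ seed bits using the rule $M_{(i,j)} = a_{i-j}$ with zero-based indices; the ordering of the seed bits and all function names are assumptions made for this illustration.

```python
import secrets
from typing import List


def random_seed(l: int, m: int) -> List[int]:
    """Draw the m + l - 1 random bits that specify an l x m Toeplitz matrix."""
    return [secrets.randbits(1) for _ in range(m + l - 1)]


def toeplitz_matrix(seed: List[int], l: int, m: int) -> List[List[int]]:
    """Build M with M[i][j] = a_{i-j}.

    The seed is ordered as a_{-(m-1)}, ..., a_0, ..., a_{l-1}, so the
    coefficient a_d is stored at index d + (m - 1).
    """
    if len(seed) != m + l - 1:
        raise ValueError("seed must contain m + l - 1 bits")
    return [[seed[i - j + m - 1] for j in range(m)] for i in range(l)]


def toeplitz_hash(seed: List[int], x: List[int], l: int) -> List[int]:
    """Hash the m-bit vector x down to l bits as M x (mod 2)."""
    M = toeplitz_matrix(seed, l, len(x))
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in M]


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 1, 0, 0]
    seed = random_seed(l=3, m=len(x))
    print(toeplitz_hash(seed, x, l=3))
```

For long keys the explicit matrix is wasteful; a practical implementation would compute the same product without materializing it, but the explicit form keeps the correspondence with Eq. (8) visible.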

In our post-processing scheme, we use Toeplitz matrices for three purposes: privacy
amplification, error verification, and authentication. We remark that fully random Toeplitz
matrices (specified by $m + l - 1$ random bits) are used for privacy amplification, while for
error verification and authentication, Toeplitz matrices specified by a smaller number ($2l$)
of random bits are used in order to save secret bits (see Sec. V C below).



C. Authentication 

Alice and Bob can authenticate their classical commu- 
nications with a family of Toeplitz matrices. For every 
(classical) message they need to authenticate, both par- 
ties select a hash function from a fixed family using pre- 
shared secret bits. The sending party computes the hash 
value for the message (called the tag) and sends both to 
the other party, who also computes the hash value for 
the received message and can conclude that the message 
originates from the legitimate party if both hash values 
are identical. 

In authentication, the key component is the construction of the hash function. Wegman and
Carter proposed unconditionally secure authentication schemes [59, 60] by introducing
universal hash function families, which are also used for privacy amplification. Afterwards,
much effort has been devoted to constructing universal hash function families efficiently. In
this paper, we use the LFSR-based Toeplitz matrix construction by Krawczyk [61, 62] for
authentication.

Here we state the result of the LFSR-based Toeplitz matrix construction, which is given by
Theorem 9 of Ref. [61] by Krawczyk. The authentication scheme based on the LFSR-based Toeplitz
matrix construction is secure with a failure probability of

$$\varepsilon = n\,2^{-k+1} \qquad (9)$$
| Step | Message length | Message encrypted? | Tag length | Failure probability |
|---|---|---|---|---|
| 1. Key sift | $N$ | - | - | - |
| 2. Basis sift | $2n$ | No | | $2\varepsilon_{bs}$ [Eq. (11)] |
| 3. Bit error correction | $k_{ec}$ [Eq. (12)] | Yes | | |
| 4. Error verification | | | | $\varepsilon_{ev}$ [Eq. (14)] |
| 5. Phase error estimation | | | | $\varepsilon_{ph}$ [Eq. (21)] |
| 6. Privacy amplification | $(n_x + n_z + l - 1)$ | No | $k_{pa}$ | $\varepsilon_{pa}$ [Eq. (?)] |

TABLE I: List of resource costs and failure probabilities in the various steps. Lengths of
pre-shared secret key bits are designated with $k$, and failure probabilities with
$\varepsilon$. The relevant equations involving these quantities are also shown.



where $k$ is the length of the tag and $n$ is the length of the message. The authentication
scheme can be stated as follows. Alice and Bob use a $2k$-bit secure key to construct a
Toeplitz matrix with a size of $k \times n$ using an LFSR. The authentication tag is generated
by multiplying the matrix and the message. Then they encrypt the tag with another $k$-bit
secure key. Since the tag is encrypted by a one-time pad, the $2k$-bit key used for the
Toeplitz matrix construction is still secure [61]. Hence, the net secure-key cost for this
scheme is $k$.
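The scheme just described can be sketched as follows. The $k \times n$ matrix is never stored: its columns are taken to be successive states of a $k$-bit LFSR, and the tag is the modulo-2 sum of the columns selected by the message bits, encrypted with a one-time pad. This is a simplified illustration under stated assumptions, not the exact construction of Ref. [61]: the bit ordering of the LFSR state is a choice made here, the feedback coefficients are supplied by the caller, and the sketch does not check that they correspond to an irreducible polynomial, which the security argument requires. All function names are hypothetical.

```python
from typing import List


def lfsr_toeplitz_tag(message: List[int], feedback: List[int],
                      init_state: List[int]) -> List[int]:
    """Hash `message` with a k x n matrix whose columns are LFSR states.

    feedback[i] is the coefficient c_i of the recurrence
    b_{t+k} = sum_i c_i * b_{t+i} (mod 2); init_state holds b_0..b_{k-1}.
    Together these 2k bits play the role of the shared secret key.
    """
    k = len(init_state)
    if len(feedback) != k:
        raise ValueError("feedback and init_state must both have k bits")
    if not any(init_state):
        raise ValueError("initial LFSR state must be nonzero")
    state = list(init_state)
    tag = [0] * k
    for bit in message:
        if bit:
            # Add (XOR) the current column into the running tag.
            tag = [t ^ s for t, s in zip(tag, state)]
        new_bit = sum(c & s for c, s in zip(feedback, state)) % 2
        state = state[1:] + [new_bit]  # shift to obtain the next column
    return tag


def one_time_pad(bits: List[int], pad: List[int]) -> List[int]:
    """XOR the tag with a fresh k-bit pad (never reuse a pad)."""
    if len(bits) != len(pad):
        raise ValueError("pad length must equal tag length")
    return [b ^ p for b, p in zip(bits, pad)]


def authenticate(message, feedback, init_state, pad):
    """Sender side: compute the encrypted tag for `message`."""
    return one_time_pad(lfsr_toeplitz_tag(message, feedback, init_state), pad)


def verify(message, received_tag, feedback, init_state, pad):
    """Receiver side: recompute the encrypted tag and compare."""
    return authenticate(message, feedback, init_state, pad) == received_tag


if __name__ == "__main__":
    # Toy parameters: recurrence b_{t+3} = b_t + b_{t+1} (from x^3 + x + 1).
    feedback, init_state, pad = [1, 1, 0], [1, 0, 0], [1, 1, 1]
    msg = [1, 0, 1, 1]
    tag = authenticate(msg, feedback, init_state, pad)
    print(tag, verify(msg, tag, feedback, init_state, pad))
```

A real deployment would use a tag length of tens of bits, draw the feedback polynomial at random from the irreducible polynomials of degree $k$, and consume a new pad for every tag.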

We remark that in Krawczyk's later result [62], the secure key required for the LFSR-based
Toeplitz matrix construction can be reduced to an arbitrary number $r$, at the cost of a
larger failure probability,



1 

2 1 " 



k + n-1 
2^/2 



(10) 



One can see that by choosing $r = 2k$, Eq. (9) gives a slightly tighter bound than Eq. (10)
for the failure probability. Since the secure-key cost is at least $k$ in this construction
due to one-time pad encryption, for simplicity, we use Eq. (9) for authentication and error
verification.

We remark that, as pointed out in Ref. [61], the LFSR-based Toeplitz matrix construction is
highly practical in real-life implementation.



VI. BASIS SIFT 

Alice and Bob send each other $n$-bit basis information. Due to the symmetry, we can assume
the same failure probability for the two message exchanges [61]:

$$\varepsilon_{bs} = n\,2^{-k_{bs}+1} \qquad (11)$$

Here, Alice and Bob use a $2k_{bs}$-bit secure key to construct a Toeplitz matrix with a size
of $k_{bs} \times n$ using an LFSR. The authentication tag is generated by multiplying the
matrix and the message. Then they encrypt the two tags with two $k_{bs}$-bit secure keys. The
total secure-key cost in this step is $2k_{bs}$ (for the one-time-pad encryption of the tags)
and the corresponding failure probability is $2\varepsilon_{bs}$. Note that when Alice and Bob
use a biased basis choice [41], they can exchange less than $n$ bits of classical information
for basis sift by data compression. Since the secure-key cost depends only logarithmically on
the length of the message, we simply use $n$ in the following discussion. At the end of this
step, Alice and Bob obtain an $n_x$ ($n_z$)-bit sifted key in the $X$ ($Z$) basis. Define the
bias ratio to be $q_x = n_x/(n_x + n_z)$ as in Sec. IV.
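Equation (11) can be inverted to find the shortest tag that meets a target failure probability, which makes the logarithmic dependence on the message length concrete. The sketch below is a minimal illustration; the function names and the sample numbers are assumptions, not values from the paper.

```python
import math
from typing import Tuple


def auth_failure_probability(n: int, k: int) -> float:
    """Failure probability n * 2^(-k+1) for an n-bit message and k-bit tag."""
    return n * 2.0 ** (-k + 1)


def min_tag_length(n: int, eps: float) -> int:
    """Smallest tag length k with n * 2^(-k+1) <= eps."""
    if n <= 0 or eps <= 0.0:
        raise ValueError("n and eps must be positive")
    k = max(1, math.ceil(math.log2(n / eps)) + 1)
    # Guard against floating-point rounding right at the boundary.
    while auth_failure_probability(n, k) > eps:
        k += 1
    return k


def basis_sift_cost(n: int, eps_bs: float) -> Tuple[int, float]:
    """Secret-key cost 2*k_bs and failure probability 2*eps_bs for the
    two authenticated basis announcements."""
    k_bs = min_tag_length(n, eps_bs)
    return 2 * k_bs, 2.0 * auth_failure_probability(n, k_bs)


if __name__ == "__main__":
    # Hypothetical numbers: 10^6 detected signals, target 1e-10 per tag.
    print(min_tag_length(10**6, 1e-10))
    print(basis_sift_cost(10**6, 1e-10))
```

Multiplying the message length by a thousand adds only about ten bits to the tag, which is why the text uses $n$ rather than a compressed length.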



VII. ERROR CORRECTION 

For simplicity of discussion, we assume that Bob tries to correct his raw key to match
Alice's. This means that we assume no advantage distillation [63, 64]. Error correction is
done by Alice sending parity information of her raw key to Bob, encrypted with secret bits
from the key pool. The secure-key cost is given by

$$k_{ec} = n_x f(e_{bx}) H(e_{bx}) + n_z f(e_{bz}) H(e_{bz}) \qquad (12)$$

where $f(x)$ is the error correction efficiency, and

$$H(x) = -x \log_2(x) - (1 - x) \log_2(1 - x) \qquad (13)$$

is the binary entropy function. In practice, Alice and Bob only need to count the amount of
classical communication used in the error correction. That is, the value of $k_{ec}$ can be
directly obtained from the post-processing. After the error correction, Alice and Bob count
the number of errors in the $X$ ($Z$) basis: $e_{bx} n_x$ ($e_{bz} n_z$). Note that although we
assume encryption of the parity information here (cf. Sec. III), it may be avoided by basing
the post-processing on other security proofs. In this case, there may be some restriction on
the error correction procedure and more privacy amplification may be required. In practice,
there is an advantage to using error correction without encryption, since if Alice and Bob
abort the QKD procedure, no secret bits will be lost due to encryption.
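Equations (12) and (13) are straightforward to evaluate numerically. The sketch below treats the error correction efficiency as a single constant rather than a function of the error rate, which is a simplifying assumption, and the default value 1.2 is purely illustrative; function names are likewise hypothetical.

```python
import math


def binary_entropy(x: float) -> float:
    """H(x) = -x log2(x) - (1 - x) log2(1 - x), with H(0) = H(1) = 0."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def error_correction_cost(n_x: int, e_bx: float, n_z: int, e_bz: float,
                          f: float = 1.2) -> float:
    """Secret-key cost of Eq. (12) with a constant efficiency factor f."""
    return (n_x * f * binary_entropy(e_bx)
            + n_z * f * binary_entropy(e_bz))


if __name__ == "__main__":
    # Hypothetical sifted-key sizes and bit error rates.
    print(error_correction_cost(n_x=10_000, e_bx=0.03,
                                n_z=90_000, e_bz=0.03))
```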

There is no failure probability associated with error correction in our post-processing
scheme. Identity between Alice's and Bob's sifted keys is verified with an error verification
step (Sec. VIII below).

VIII. ERROR VERIFICATION 

Suppose Alice and Bob hold bit strings $a$ and $b$, respectively. They can verify the identity
of their strings by exchanging shorter strings which are the hash values $f(a)$ and $f(b)$.
Identity of the two hash values provides confidence that the two strings are the same. Below
we argue that error verification is the same as authentication, and thus we use the same
procedure for both purposes. This procedure and the associated properties are described in the
authentication section (Sec. V C).



A. Relation to authentication 

In QKD post-processing, authenticated classical communication is required to overcome the
man-in-the-middle attack. The objective of the authentication procedure can be stated as
follows. Alice sends Bob a message through a (classical) channel, which is accessible to Eve.
Alice uses some authentication scheme to make sure (with a high probability) that the message
is not modified during the transmission. This classical problem is well studied in the
literature [?]. One traditional solution is for Alice to add a redundant code (tag) to the
message to be sent. The tag-message pair is designed in such a way that whenever the message
is modified, Bob can detect it (with a high probability).

Error verification, on the other hand, is the procedure for ensuring (with a high probability)
that the bit strings (or keys) owned by Alice and Bob are identical. One natural way to do
this is by random hashing. For example, Alice randomly hashes her bit string and sends the
hash value to Bob. Bob uses the same hash function to obtain his hash value and compares it to
the one sent by Alice. The probability that Alice and Bob possess different bit strings (keys)
decreases exponentially with the number of rounds of hashing.

By comparing the two procedures, authentication and 
error verification, one can see their commonality. In order 
to show the link between the two procedures, we break 
down the authentication procedure into two parts: Alice 
sends Bob the message first and then the tag. Let us take 
a look at the stage where Bob just receives the message 
sent by Alice (but before the tag). Now, Alice and Bob 
each has a bit string. In authentication, Alice sends a tag 
(corresponding to her message) to Bob and Bob verifies 
it. The claim of a secure authentication scheme is that if 
the tag passes Bob's test, the probability that Alice and 
Bob share the same string is high. This is exactly what 
is asked in the error verification procedure. Therefore, 
secure authentication schemes can be used for the error 
verification. 

We remark that the only difference between the two 
procedures is that authentication does not care whether 
the tag reveals information about the message or not, 
but error verification does (at least for our use in QKD 
post-processing). This difference can be easily overcome 
by encrypting the tag, which has already been done in 
some authentication schemes including the one we use in 
this paper. 

Thus, in this procedure, Alice sends an encrypted tag of an authentication scheme to Bob. The failure probability for this step, $\varepsilon_{ev}$, similar to Eq. (11), is

$$\varepsilon_{ev} = (n_x + n_z)\,2^{-k_{ev}+1}. \tag{14}$$
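As a small numerical illustration of Eq. (14), the following sketch evaluates the verification failure probability for a given tag length and searches for the shortest tag that meets a target. The function names and the example sizes are assumptions made for illustration only; they do not appear in the paper.

```python
import math


def verification_failure_probability(n_x: int, n_z: int, k_ev: int) -> float:
    """Eq. (14): eps_ev = (n_x + n_z) * 2**(-k_ev + 1)."""
    return (n_x + n_z) * 2.0 ** (-k_ev + 1)


def min_tag_length(n_x: int, n_z: int, eps_target: float) -> int:
    """Smallest integer k_ev with eps_ev <= eps_target (hypothetical helper)."""
    k = math.ceil(1 + math.log2((n_x + n_z) / eps_target))
    # Guard against floating-point rounding right at the boundary.
    while verification_failure_probability(n_x, n_z, k) > eps_target:
        k += 1
    return k


if __name__ == "__main__":
    k = min_tag_length(10**6, 10**6, 1e-10)
    print(k, verification_failure_probability(10**6, 10**6, k))
```

The logarithmic dependence on the key length is the point of interest: doubling the sifted key length costs only one extra tag bit at a fixed failure probability.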

IX. PHASE ERROR RATE ESTIMATION 

In the BB84 protocol, Alice and Bob measure the bit error rate in the X-basis, $e_{bx}$, to estimate the phase error rate in the Z-basis, $e_{pz}$, and vice versa. In the infinite-length limit, the error rates $e_{bx}$ and $e_{pz}$ converge to the underlying probabilities $p_{bx}$ and $p_{pz}$. Due to the symmetry of BB84, we know that $p_{bx} = p_{pz}$, from which $e_{bx} = e_{pz}$ follows in the asymptotic case. With a finite key size, each rate fluctuates around the corresponding probability. The question can now be stated as a random sampling problem: given the bit error rate in the X-basis ($e_{bx}$), the sample size ($n_x$), and the population size ($n_x + n_z$), upper bound the phase error rate in the Z-basis ($e_{pz}$), with a probability $1 - P_{\theta_x}$,

$$P_{\theta_x} = \Pr\{e_{pz} \ge e_{bx} + \theta_x\}, \tag{15}$$

where $\theta_x$ represents the deviation of the phase error rate from the tested value (the bit error rate in another basis) due to the finite-size effect. Here $n_x$ and $n_z$ are the numbers of sifted key bits in the X- and Z-basis, respectively. The failure probability $P_{\theta_x}$ will be related to the failure probability of the phase error rate estimation step (see Eq. (21)).

A. Random sampling 

Define two random variables: $k = e_{bx} n_x$ and $m = e_{pz} n_z + e_{bx} n_x$. The number of bit errors in the X-basis, $k$, can be accurately (with a high probability) counted after the error verification procedure. Note that $m$ denotes the number of bit errors if Bob measures all $n_x + n_z$ qubits in the X-basis. In the squashing model [H, [H, [65| , Eve prepares the qubit received by Bob. Hence, one can assume Eve chooses a distribution of $m$, $\Pr\{m\}$, before Bob's detection.

In order to link the probability $P_{\theta_x}$ to the measurement results $n_x$, $n_z$ and $k$ (or $e_{bx}$), we go back to the original definition of the security parameter in QKD. In the security analysis of QKD, $P_{\theta_x}$ denotes the probability that Eve sets up a distribution $\Pr\{m\}$ (by preparing qubits) and then Bob obtains $k$ bit errors in the X-basis. Therefore, the mathematical definition for $P_{\theta_x}$ is

\begin{align}
P_{\theta_x} &= \Pr\{e_{pz} \ge e_{bx} + \theta_x,\ e_{bx}\} \notag\\
&= \Pr\{m \ge e_{bx}(n_x + n_z) + \theta_x n_z,\ k\} \notag\\
&= \sum_{m \ge e_{bx}(n_x+n_z)+\theta_x n_z} \Pr\{m, k\} \tag{16}\\
&= \sum_{m \ge e_{bx}(n_x+n_z)+\theta_x n_z} \Pr\{k|m\}\Pr\{m\}. \notag
\end{align}

Bob chooses to measure the X-basis randomly (without replacement, of course), and thus $\Pr\{k|m\}$ is given by a hypergeometric function,

$$\Pr\{k|m\} = \frac{\binom{m}{k}\binom{n_x+n_z-m}{n_x-k}}{\binom{n_x+n_z}{n_x}}. \tag{17}$$
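For small block sizes, the hypergeometric probability of Eq. (17) can be evaluated exactly with integer binomials. The sketch below does so and checks numerically, for one toy choice of sizes, that the probability falls as $m$ grows beyond $(n_x + n_z)e_{bx}$. The function name and the toy numbers are assumptions chosen for illustration.

```python
from math import comb


def pr_k_given_m(k: int, m: int, n_x: int, n_z: int) -> float:
    """Eq. (17): probability of seeing k errors among the n_x sampled
    positions when the whole population of n_x + n_z positions holds m errors."""
    total = n_x + n_z
    if k < 0 or k > min(m, n_x) or n_x - k > total - m:
        return 0.0
    return comb(m, k) * comb(total - m, n_x - k) / comb(total, n_x)


# Toy sizes (assumed): 200 sampled positions, 800 unsampled positions,
# and k = 10 observed errors, i.e. e_bx = 5%.
n_x, n_z, k = 200, 800, 10
m_start = (n_x + n_z) * k // n_x  # m = (n_x + n_z) * e_bx = 50
probs = [pr_k_given_m(k, m, n_x, n_z) for m in range(m_start, m_start + 60)]

if __name__ == "__main__":
    print(probs[0], probs[-1])
```

The list `probs` is the likelihood of the observed count as a function of the hidden total `m`, which is the quantity bounded in the next step of the argument.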



It is not hard to prove that Eq. (17) is a strictly decreasing function of $m$ when $m > (n_x + n_z)e_{bx}$. Thus, from Eq. (16),

\begin{align}
P_{\theta_x} &\le \Pr\{k \mid m = e_{bx}(n_x + n_z) + \theta_x n_z\} \notag\\
&< \frac{\sqrt{n_x + n_z}}{\sqrt{e_{bx}(1 - e_{bx})\,n_x n_z}}\; 2^{-(n_x+n_z)\,\xi_x(\theta_x)}, \tag{18}
\end{align}

where the first equality holds when Eve sets the probability distribution to be a delta function $\Pr\{m\} = \delta_{m,\,e_{bx}(n_x+n_z)+\theta_x n_z}$. The derivation of the second inequality is presented in Appendix B. Note that all the variables in Eq. (18) can be measured in practice. The function $\xi_x(\theta_x)$ is given by

$$\xi_x(\theta_x) = H(e_{bx} + \theta_x - q_x\theta_x) - q_x H(e_{bx}) - (1 - q_x)H(e_{bx} + \theta_x), \tag{19}$$

where $q_x = n_x/(n_x + n_z)$ is the bias ratio.
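To make Eqs. (18) and (19) concrete, the sketch below evaluates the exponent $\xi_x(\theta_x)$ and the right-hand side of Eq. (18), and compares the result with the exact hypergeometric probability for a toy block. The function names and the toy sizes are illustrative assumptions, not values from the paper.

```python
from math import comb, log2, sqrt


def binary_entropy(x: float) -> float:
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def xi(e_b: float, theta: float, q: float) -> float:
    """Exponent of Eq. (19)."""
    return (binary_entropy(e_b + theta - q * theta)
            - q * binary_entropy(e_b)
            - (1 - q) * binary_entropy(e_b + theta))


def sampling_bound(n_x: int, n_z: int, e_bx: float, theta_x: float) -> float:
    """Right-hand side of Eq. (18)."""
    total = n_x + n_z
    q_x = n_x / total
    prefactor = sqrt(total) / sqrt(e_bx * (1 - e_bx) * n_x * n_z)
    return prefactor * 2.0 ** (-total * xi(e_bx, theta_x, q_x))


# Toy check (assumed sizes): theta_x is chosen so that theta_x * n_z is an integer.
n_x, n_z, k = 200, 800, 10
e_bx = k / n_x
theta_x = 0.05
m = round(e_bx * (n_x + n_z) + theta_x * n_z)  # total number of errors, here 90
exact = comb(m, k) * comb(n_x + n_z - m, n_x - k) / comb(n_x + n_z, n_x)
bound = sampling_bound(n_x, n_z, e_bx, theta_x)

if __name__ == "__main__":
    print(f"exact = {exact:.3e}, bound = {bound:.3e}")
```

For these toy sizes the bound stays above the exact value while remaining of the same order of magnitude, which is what makes it useful for optimizing $\theta_x$.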

A similar formula for the failure probability of phase error rate estimation in the X-basis, $P_{\theta_z}$, can also be derived,

$$P_{\theta_z} < \frac{\sqrt{n_x + n_z}}{\sqrt{e_{bz}(1 - e_{bz})\,n_x n_z}}\; 2^{-(n_x+n_z)\,\xi_z(\theta_z)}, \tag{20}$$

where $\xi_z(\theta_z)$ is defined by $\xi_z(\theta_z) = H(e_{bz} + \theta_z - q_z\theta_z) - q_z H(e_{bz}) - (1 - q_z)H(e_{bz} + \theta_z)$ and $q_z = n_z/(n_x + n_z)$. Combining the failure probabilities for the X-basis and Z-basis, the total failure probability of phase error rate estimation, $\varepsilon_{ph}$, is then given by

$$\varepsilon_{ph} \le P_{\theta_x} + P_{\theta_z}. \tag{21}$$

In case $e_{bx} = 0$ (or $e_{bz} = 0$), one can replace it by $n_x e_{bx} = 1$ (or $n_z e_{bz} = 1$) to get around the singularity, as shown in Appendix B. One can see that $\xi_x(\theta_x)$ is positive when $\theta_x > 0$ and $0 < e_{bx}$, $e_{bx} + \theta_x < 1$, due to the concavity of the binary entropy function $H(x)$.



When $q_x = 1/2$, i.e., $n_x = n_z$, and $\theta_x$ is small, the failure
probability is given by



Pe, < 



1 



2 v /2ne bx (l - e bx ) 



e Hi-eto)e hx _ (23) 



Except for the factor 1/ 2^2ne bx (l — e bx ) , this is what 
is used in the literature, such as Refs. [35[- In practice, 
normally we have 2^2ne bx (l — e bx ) > 1, so the bound 
given by Eq. (|23p is tighter than what is used in the 
literature. 



X. PRIVACY AMPLIFICATION 

In view of the EDP picture, we regard privacy am- 
plification as the result of phase error correction. In 
the following, we focus on using two-universal hashing 
to perform phase error correction and determine the cor- 
responding failure probability. 



A. Two-universal hashing 

The family of all Toeplitz matrices $\{M\}$ of size $l \times m$ has $2^{l+m-1}$ elements and satisfies the following property:

$$\Pr\{Mx = My\} \le 2^{-l} \quad \text{for all } x \ne y, \tag{24}$$

where it is assumed that each matrix is chosen with equal probability. This can be proved by slightly adapting the proof of Claim 7 of Ref. [66]. Note that the family of hash functions performed with Toeplitz matrices is one specific case of a more general class known as the two-universal families of hash functions. A family of hash functions mapping $S$ to $T$ is called two-universal [59] if

$$\Pr\{f(x) = f(y)\} \le \frac{1}{|T|} \quad \text{for all } x \ne y, \tag{25}$$

where $f(x)$ is a hash function chosen in the family of hash functions and in our case $f(x) = Mx$. Two-universal families of hash functions have many useful properties and we will rely on some of them in this paper.
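The Toeplitz construction can be sketched in a few lines. The code below builds an $l \times m$ Toeplitz matrix implicitly from $l + m - 1$ seed bits and applies it to a bit string over GF(2). The function names and the indexing convention `M[i][j] = seed[i - j + m - 1]` are assumptions made for this illustration; any convention in which the entry depends only on `i - j` gives a Toeplitz matrix.

```python
import secrets


def random_toeplitz_seed(l: int, m: int) -> list[int]:
    """Draw l + m - 1 uniformly random bits, one per diagonal of an l x m Toeplitz matrix."""
    return [secrets.randbits(1) for _ in range(l + m - 1)]


def toeplitz_hash(seed: list[int], x: list[int], l: int) -> list[int]:
    """Return M x over GF(2), where M[i][j] = seed[i - j + m - 1]."""
    m = len(x)
    if len(seed) != l + m - 1:
        raise ValueError("seed must contain l + m - 1 bits")
    out = []
    for i in range(l):
        bit = 0
        for j in range(m):
            # Multiplication is AND and addition is XOR over GF(2).
            bit ^= seed[i - j + m - 1] & x[j]
        out.append(bit)
    return out


if __name__ == "__main__":
    seed = random_toeplitz_seed(4, 8)
    print(toeplitz_hash(seed, [1, 0, 1, 1, 0, 0, 1, 0], 4))
```

Because the map is linear, the hash of the XOR of two strings equals the XOR of their hashes, which is the property used when a hash value of an error pattern is formed from two separately computed hash values.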



B. Large data size approximation

In the limit of large $n_x$ and $n_z$, $\theta_x$ can be chosen to be small. Then we can use a Taylor expansion of Eq. (19):

\begin{align}
\xi_x(\theta_x) &= H(e_{bx} + \theta_x - q_x\theta_x) - q_x H(e_{bx}) - (1 - q_x)H(e_{bx} + \theta_x) \notag\\
&= -\tfrac{1}{2}(1 - q_x)q_x H''(e_{bx})\,\theta_x^2 + O(\theta_x^3) \tag{22}\\
&= \frac{(1 - q_x)q_x}{2\ln 2\,(1 - e_{bx})e_{bx}}\,\theta_x^2 + O(\theta_x^3). \notag
\end{align}

B. Error correction



Suppose Alice holds a bit string, $a$, and Bob a noisy version of it, $b$. The difference between the two strings, $e = a \oplus b$, is the error pattern. Let $S$ be the set of all possible error patterns. Alice and Bob intend to use a family of two-universal linear hash functions to correct errors in Bob's string with respect to Alice's. A hash function $f(\cdot)$ is selected from the family and Alice and Bob each compute the hash value of their bit strings with the hash function. Alice sends Bob her hash value, to which Bob adds his hash value to arrive at the hash value of the error pattern $f(e) = f(a) \oplus f(b)$. Note that this is valid due to the linearity of the hash functions. Using this hash value, Bob can identify the error pattern and thus correct the errors in his string. Suppose that there are $|T|$ possible outputs for this family of hash functions. Using the definition of a two-universal family given in Eq. (25), we can bound the probability of incorrectly identifying the error pattern as

$$\Pr\Big\{\bigcup_{e' \in S\setminus e} f(e) = f(e')\Big\} \le \frac{|S|}{|T|}, \tag{26}$$

which follows from applying the union bound to Eq. (25). Thus, Bob's error-corrected string matches Alice's with probability at least $1 - |S|/|T|$.

Although this hashing-based error correcting proce- 
dure may not be as practical and efficient as conventional 
ones, it is useful for phase error correction in security 
proofs [1, |H, This is because for security purpose 
we only need to show that the phase error pattern is 
identified without actually correcting the error Q, and 
we only need a bound on the probability of successfully 
identifying the error pattern. 



C. Privacy amplification and phase error correction 

Suppose we perform privacy amplification using a set of $l \times m$ Toeplitz matrices, a member of which can be selected with $l + m - 1$ random bits. Here, $l$ is the final key length and $m$ is the sifted key length. For each matrix $M$ in the privacy amplification set, we associate an $(m - l) \times m$ matrix $M^{\perp}$ that is orthogonal to $M$. The collection of all these matrices $\{M^{\perp}\}$ forms the set of hash functions for phase error correction. We would like to find out whether this set $\{M^{\perp}\}$ has the property of Eq. (25). If it does, we can determine the success probability of phase error correction from Eq. (26).

We remark that it does not matter whether the matrices of the set $\{M^{\perp}\}$ have the Toeplitz form or not, since we do not need to generate them but only need to make sure that there exists such a set with a certain phase error correcting capability. On the other hand, we do impose the Toeplitz form on the privacy amplification set $\{M\}$ since we actually need to generate this set.

Indeed, it can be shown (see, e.g., Theorem 1 of Ref. (6?]]) that when $M$ is chosen from a set of random Toeplitz matrices, the corresponding matrices $M^{\perp}$ also form a two-universal set, i.e.,

$$\Pr\{M^{\perp}x = M^{\perp}y\} \le \frac{1}{2^{m-l}} \quad \text{for all } x \ne y.$$

Thus, according to the discussion in Sec. X B, we can use the set $\{M^{\perp}\}$ to identify the phase error pattern and perform the correction. In essence, when (i) there are $|S|$ possible phase error patterns, (ii) the sifted key length is $m$, and (iii) the final key length is $l$, the failure probability of phase error correction is upper bounded by

$$\varepsilon_{pc} \le |S|\,2^{-(m-l)}. \tag{27}$$

(Note that the sifted key length here will be equal to $m = n_x + n_z$ when it is used in the next subsection.)

For BB84, the bit error rates for the Z bits and X bits are exactly determined from the error correction procedures, up to a certain probability given by the verification step (cf. Eq. (14)). Focusing on the Z bits, we can estimate its phase error rate $e_{pz}$ from the actual bit error rate of the X bits $e_{bx}$ using the random sampling argument of Sec. IX. Accordingly, the lower and upper bounds on the number of phase errors on the Z bits are, except with probability $P_{\theta_x}$ (which is bounded in Eq. (18)),

$$0 \le e_{pz} n_z < (e_{bx} + \theta_x)n_z. \tag{28}$$

(Note that the second inequality is a strict less-than due to the definition of $P_{\theta_x}$ in Eq. (15).) Therefore, with probability at least $1 - P_{\theta_x}$, the number of possible phase error patterns in the Z bits is

\begin{align}
|S_z| = \sum_{k=0}^{(e_{bx}+\theta_x)n_z - 1} \binom{n_z}{k} &< \binom{n_z}{(e_{bx}+\theta_x)n_z} \tag{29}\\
&\le 2^{n_z H(e_{bx}+\theta_x)}, \tag{30}
\end{align}

where the first inequality holds for $e_{bx} + \theta_x < 1/3$ (see Appendix C for proof of the first inequality and see, e.g., Refs. [6l,[63| for proof of the second inequality). We can similarly obtain the bound for the number of possible phase error patterns in the X bits, $|S_x|$. Combining the number of patterns in the Z and X bits, we have $|S| = |S_z||S_x|$ in Eq. (27).
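The two counting bounds of Eqs. (29) and (30) are easy to check numerically for a small block. The sketch below counts the error patterns exactly and compares the count with the single binomial coefficient and with the entropy bound. The helper names and the toy values of the block size and error rate are assumptions for illustration.

```python
from math import comb, log2


def binary_entropy(x: float) -> float:
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def count_patterns(n: int, w: int) -> int:
    """Number of n-bit strings with fewer than w ones: sum_{k=0}^{w-1} C(n, k)."""
    return sum(comb(n, k) for k in range(w))


# Toy values (assumed): `rate` stands for e_bx + theta_x and must be below 1/3.
n_z, rate = 300, 0.06
w = round(rate * n_z)                                # 18 phase errors at most (exclusive)
exact = count_patterns(n_z, w)                       # left-hand side of Eq. (29)
single_binomial = comb(n_z, w)                       # right-hand side of Eq. (29)
entropy_bound_bits = n_z * binary_entropy(w / n_z)   # exponent in Eq. (30)

if __name__ == "__main__":
    print(log2(exact), log2(single_binomial), entropy_bound_bits)
```

Working with the base-2 logarithms, as in the final print, shows directly how many bits of the sifted key must be given up to cover all candidate patterns.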



D. Key length 

Alice and Bob determine the size of the matrix, $l \times (n_x + n_z)$, used for hashing. Here, $l$ is the key length after the privacy amplification. Alice generates an $(n_x + n_z + l - 1)$-bit random bit string and sends it to Bob through an authenticated channel. Alice and Bob use this random bit string to generate a Toeplitz matrix. The final key (with a size of $l$) will be the product of this matrix (with a size of $l \times (n_x + n_z)$) and the key string (with a size of $n_x + n_z$) after passing through the error verification. The overall failure probability of the privacy amplification is the sum of that for authentication (Eq. (9)) and that for phase error correction (Eq. (27)):

$$\varepsilon_{pa} = (n_x + n_z + l - 1)\,2^{-k_{pa}+1} + 2^{-t_{oe}}, \tag{31}$$

where $k_{pa}$ is the secure-key cost for the authentication and $t_{oe}$ is related to Eq. (27) by $2^{-t_{oe}} = \varepsilon_{pc}$. By rearranging Eqs. (27) and (30), the final key length is

$$l = n_x[1 - H(e_{bz} + \theta_z)] + n_z[1 - H(e_{bx} + \theta_x)] - t_{oe}. \tag{32}$$

The first term in Eq. (31) gives the failure probability of the authentication for the $(n_x + n_z + l - 1)$-bit random bit string transmission. The second term in Eq. (31) gives the failure probability of privacy amplification using the Toeplitz matrix. In the equivalent EDP used in the security proof d, H3] , the second term in Eq. (31) gives the failure probability of the phase error correction.
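Eqs. (31) and (32) translate directly into two short functions. The sketch below is one possible transcription; the function names, the rounding of the key length down to a whole number of bits, and the clamping at zero are assumptions added for illustration and are not prescribed by the paper.

```python
from math import floor, log2


def binary_entropy(x: float) -> float:
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def final_key_length(n_x, n_z, e_bx, e_bz, theta_x, theta_z, t_oe) -> int:
    """Eq. (32), rounded down to whole bits and clamped at zero (assumed convention)."""
    length = (n_x * (1 - binary_entropy(e_bz + theta_z))
              + n_z * (1 - binary_entropy(e_bx + theta_x))
              - t_oe)
    return max(0, floor(length))


def privacy_amplification_failure(n_x, n_z, l, k_pa, t_oe) -> float:
    """Eq. (31): authentication term plus phase-error-correction term."""
    return (n_x + n_z + l - 1) * 2.0 ** (-k_pa + 1) + 2.0 ** (-t_oe)


if __name__ == "__main__":
    l = final_key_length(9_000_000, 100_000, 0.04, 0.04, 0.01, 0.01, 40)
    print(l, privacy_amplification_failure(9_000_000, 100_000, l, 70, 40))
```

Note that Eq. (32) accounts only for privacy amplification; the costs of error correction and authentication are subtracted separately in the key-rate formula of the optimization section.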

XI. OPTIMIZATION 

Alice and Bob calibrate the QKD system to get an estimate of the transmittance $\eta$ and the error rates $e_{bx}$ and $e_{bz}$. Through some rough calculation of the target length of the final key, they decide the acceptable confidence interval $1 - \varepsilon$ and fix the length of the experiment, $N$, which denotes the number of pulses sent by Alice. Then roughly, the length of the raw key is $n = N\eta$. After basis sift, Alice and Bob share an $n_x$-bit ($n_z$-bit) key in the X (Z)-basis.

Alice and Bob can optimize their post-processing using either the failure probability or the trace distance as the security measure, since they are directly related to one another as discussed in Sec. III B. Here, we will use the failure probability as the security measure for our discussion. The failure probability $\varepsilon$ is chosen by Alice and Bob according to the later practical use of the final key. The desired message security level sets an upper bound threshold value for $\varepsilon$. Thus, the exact value of $\varepsilon$ is not strictly pre-determined. That is, it can slightly deviate from the pre-determined threshold value.

Given $n$ and $\varepsilon$ (cf. Eq. (7)), Alice and Bob need to optimize all the parameters for the key generation rate given in Eq. ©. The first parameter they want to optimize is the basis bias ratio, $q_x = n_x/(n_x + n_z)$, which (roughly) determines the probabilities to choose the X and Z bases, $p_x$ and $p_z$, by $q_x \approx p_x^2/(p_x^2 + p_z^2)$. The bias ratio should be determined before quantum transmission while all other parameters can be determined right after a raw key is obtained. The initial calibration process gives Alice and Bob some idea about the basis ratio which they will use in the subsequent QKD process. The remaining parameters that need to be optimized are as follows: $k_{bs}$, $k_{ev}$, $k_{pa}$, $\varepsilon_{bs}$, $\varepsilon_{ev}$, $\varepsilon_{ph}$, $\varepsilon_{pa}$ and $t_{oe}$. Alice and Bob need to balance the failure probabilities from each step (cf. Eq. 0) and the secure-key cost (cf. Eq. ©). The optimization problem becomes the following: given the total failure probability

\begin{align}
\varepsilon = {}& 2n\,2^{-k_{bs}+1} + (n_x + n_z)\,2^{-k_{ev}+1} + \varepsilon_{ph} \tag{33}\\
& + (n_x + n_z + l - 1)\,2^{-k_{pa}+1} + 2^{-t_{oe}}, \notag
\end{align}

maximize the final key length

$$NR \ge l - 2k_{bs} - k_{ec} - k_{ev} - k_{pa}. \tag{34}$$

Note that the parameters $k_{bs}$, $k_{ev}$, $k_{pa}$ and $t_{oe}$ affect $\varepsilon$ and the final key rate in similar ways. Also, error correction and phase error rate estimation mainly depend on the bias ratio. Thus, Alice and Bob can group the secure key costs and failure probabilities into two parts by defining $\varepsilon_3 = 2\varepsilon_{bs} + \varepsilon_{ev} + \varepsilon_{pa}$ and $k_3 = 2k_{bs} + k_{ev} + k_{pa} + t_{oe}$ (see Eqs. (32), ©, and 0). The final secure key length can be rewritten as

\begin{align}
NR \ge {}& n_x[1 - f(e_{bx})H(e_{bx}) - H(e_{bz} + \theta_z)] \notag\\
& + n_z[1 - f(e_{bz})H(e_{bz}) - H(e_{bx} + \theta_x)] - k_3. \tag{35}
\end{align}

We remark that if the contribution from one basis is negative in Eq. (35), Alice and Bob should use the detections from this basis for parameter estimation only, but not for the key.

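We consider the subproblem: given the failure probability

\begin{align}
\varepsilon_3 \le {}& 2\varepsilon_{bs} + \varepsilon_{ev} + \varepsilon_{pa} \notag\\
= {}& 2n\,2^{-k_{bs}+1} + (n_x + n_z)\,2^{-k_{ev}+1} \tag{36}\\
& + (n_x + n_z + l - 1)\,2^{-k_{pa}+1} + 2^{-t_{oe}}, \notag
\end{align}

minimize the secret-key cost

$$k_3 \ge 2k_{bs} + k_{ev} + k_{pa} + t_{oe}. \tag{37}$$

With the inequality of arithmetic and geometric means, one can show that the optimized secure-key cost for each step is given by

\begin{align}
t_{oe} &= \frac{k_3}{5} - \frac{4}{5} - \frac{1}{5}\log_2 A \notag\\
k_{bs} &= t_{oe} + 1 + \log_2 n \tag{38}\\
k_{ev} &= t_{oe} + 1 + \log_2(n_x + n_z) \notag\\
k_{pa} &= t_{oe} + 1 + \log_2(n_x + n_z + l - 1), \notag
\end{align}

where $A = n^2(n_x + n_z)(n_x + n_z + l - 1)$. With this choice each of the five terms of Eq. (36) equals $2^{-t_{oe}}$, and the corresponding failure probability is

$$\varepsilon_3 = 5A^{1/5}\,2^{-(k_3 - 4)/5}. \tag{39}$$

From Eq. (39), we have

$$k_3 = -5\log_2\varepsilon_3 + \log_2 A + 4 + 5\log_2 5. \tag{40}$$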
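The balanced allocation of Eqs. (38) to (40) can be checked numerically. The sketch below splits a total cost $k_3$ into the individual costs, evaluates the failure probability both term by term (Eq. (36)) and in closed form (Eq. (39)), and inverts the relation with Eq. (40). All function names and the example sizes are assumptions for illustration, and the costs are left real-valued here, whereas an implementation would round them up to whole bits.

```python
from math import log2


def _A(n, n_x, n_z, l):
    """A = n^2 (n_x + n_z)(n_x + n_z + l - 1), as defined below Eq. (38)."""
    return n**2 * (n_x + n_z) * (n_x + n_z + l - 1)


def optimal_costs(k3, n, n_x, n_z, l):
    """Eq. (38): split the total cost k3 among the individual steps."""
    t_oe = (k3 - 4 - log2(_A(n, n_x, n_z, l))) / 5
    return {
        "t_oe": t_oe,
        "k_bs": t_oe + 1 + log2(n),
        "k_ev": t_oe + 1 + log2(n_x + n_z),
        "k_pa": t_oe + 1 + log2(n_x + n_z + l - 1),
    }


def eps3_from_costs(c, n, n_x, n_z, l):
    """Right-hand side of Eq. (36), evaluated term by term."""
    return (2 * n * 2.0 ** (-c["k_bs"] + 1)
            + (n_x + n_z) * 2.0 ** (-c["k_ev"] + 1)
            + (n_x + n_z + l - 1) * 2.0 ** (-c["k_pa"] + 1)
            + 2.0 ** (-c["t_oe"]))


def eps3_closed_form(k3, n, n_x, n_z, l):
    """Eq. (39)."""
    return 5 * _A(n, n_x, n_z, l) ** 0.2 * 2.0 ** (-(k3 - 4) / 5)


def k3_from_eps3(eps3, n, n_x, n_z, l):
    """Eq. (40)."""
    return -5 * log2(eps3) + log2(_A(n, n_x, n_z, l)) + 4 + 5 * log2(5)


if __name__ == "__main__":
    args = (10**6, 600_000, 200_000, 300_000)
    costs = optimal_costs(300.0, *args)
    print(costs, eps3_from_costs(costs, *args))
```

The term-by-term evaluation and the closed form agree because the balanced choice makes every contribution to $\varepsilon_3$ equal to $2^{-t_{oe}}$.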

Note that $n^4/4 < A < 2n^4$ and also $\varepsilon = \varepsilon_3 + \varepsilon_{ph}$. Here if Alice and Bob allow $\varepsilon$ to have a small deviation from the pre-determined value, they can put a soft lower bound for $\varepsilon_3$ in the optimization. The exact value of the soft lower bound is not really important here as long as it is within the tolerable fluctuation range of $\varepsilon$. Here, we simply choose the tolerable deviation to be within 1%, which implies that $10^{-2}\varepsilon < \varepsilon_3 < \varepsilon$. Thus,

\begin{align}
-5\log_2\varepsilon + 4\log_2 n + 2 + 5\log_2 5 &< k_3 \notag\\
&< -5\log_2\varepsilon + 4\log_2 n + 15 + 15\log_2 5. \tag{41}
\end{align}

This is true for all $\theta_x$, $\theta_z$ and $q_x$. Note that the difference between the lower bound and the soft upper bound of $k_3$ is less than 37 bits. When the final key length is much larger than 37 bits, Alice and Bob can set

$$k_3 = -5\log_2\varepsilon + 4\log_2 n + 50 \tag{42}$$

and the failure probability $\varepsilon_3$ will satisfy $\varepsilon_3 < 10^{-2}\varepsilon$ since the right-hand side of Eq. (42) is larger than the upper bound in Eq. (41).

Since Alice and Bob will recalculate the failure probability in the end and allow the final $\varepsilon$ to have a small deviation from the predefined value, they can safely use $\varepsilon_{ph} = \varepsilon$ in the optimization of the basis bias. Thus, the simplified optimization problem only has three parameters to be optimized: $q_x$, $\theta_x$ and $\theta_z$, given $\varepsilon_{ph} = \varepsilon - \varepsilon_3 \approx \varepsilon$.

In summary, the simplified optimization procedure for a target failure probability $\varepsilon$ is as follows:

1. Compute $k_3$ using Eq. (42);

2. Maximize the key rate in Eq. (35) over $q_x$, $\theta_x$, and $\theta_z$ subject to $\varepsilon_{ph} = \varepsilon$. Here, $\varepsilon_{ph}$ is related to the three optimization variables by Eqs. (18), (20), and (21);

3. After the optimization, they can recalculate the final failure probability $\varepsilon = \varepsilon_3 + \varepsilon_{ph}$, where $\varepsilon_3$ is given in Eq. (39).

As discussed above, since one can set $\varepsilon_3 < 10^{-2}\varepsilon$ (when the key length is much larger than 37 bits), the failure probabilities for basis sift, error verification, and privacy amplification are relatively small, and the failure probability for random sampling is the major contribution to the total failure probability.

Observation. The main effect of the finite key analysis 
for the QKD post-processing stems from the phase error 
rate estimation. Inefficiencies due to authentication, er- 
ror verification, and privacy amplification are relatively 
insignificant. 

XII. SIMULATIONS 

Now let us consider an example of the post-processing 
in the simple case of symmetric errors in the two bases. 

Suppose $N = 10^{10}$, $\eta = 10^{-3}$ (then $n = N\eta = 10^{7}$), $e_{bx} = e_{bz} = 4\%$ and $\varepsilon = 10^{-7}$. It is not hard to see that the final key length is much larger than 37 bits. Thus, the simplified optimization is used.

First, the secure-key cost ^3 = 543 bit, according to 
Eq. 03'- 

Second, given $n = 10^{7}$, $e_{bx} = e_{bz} = 4\%$ and $\varepsilon = 10^{-7}$, we optimize the parameters $\theta_x$, $\theta_z$ and $q_x$. Through a numerical program, we get $\theta_x = 1.07\%$, $\theta_z = 0.84\%$ and $q_x = 99.8\%$ (or $p_x = 96.0\%$). Note that, in this case, the X and Z bases are interchangeable due to symmetry.

FIG. 2: Lower bound for the key rate as a function of the raw key length; parameters used: $e_{bx} = e_{bz} = 4\%$ and the error correction efficiency is 100%. The three curves correspond to three different values of failure probability $\varepsilon$.

Finally, we can compute the key length and the corresponding security parameter using our post-processing procedure and compare with the key length using asymptotic assumptions. The final key length using asymptotic assumptions is

$$l_{asymp} = n[1 - h_2(e_{bx}) - h_2(e_{bz})], \tag{43}$$

where we used the fact that, asymptotically, the phase error rate in one basis is the same as the bit error rate in the other basis and the use of efficient BB84 leads to always matching basis between Alice and Bob. The key length with asymptotic analysis is 5.15 Mb, and the one with the post-processing procedure is 4.41 Mb and its failure probability is $\varepsilon = 1.0073 \times 10^{-7}$ (roughly $1 + 2^{-7}$ times the predefined value of $10^{-7}$). Furthermore, we can get the trace-distance security parameter using Lemma 1 to conclude that this 4.41 Mb key is composable and is $(4.4884 \times 10^{-4})$-secure in accordance with security Definition 1. Here, for illustrative purposes, the key length using the post-processing procedure is calculated with the assumption that $n_x = np_x^2$ and $n_z = n(1 - p_x)^2$.
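The two key lengths quoted above can be reproduced, to the stated precision, from Eqs. (35) and (43) with the parameters reported in the text. The sketch below is only a consistency check under these assumptions: the error correction efficiency is taken as $f = 1$, the optimized values of $\theta_x$, $\theta_z$ and $p_x$ are copied from the text rather than re-optimized, the value of $k_3$ is the one quoted in the text, and the variable names are illustrative.

```python
from math import log2


def h2(x: float) -> float:
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


n = 10**7                      # raw key length
e_bx = e_bz = 0.04             # bit error rates
l_asymptotic = n * (1 - h2(e_bx) - h2(e_bz))          # Eq. (43)

p_x = 0.96                     # reported probability of choosing the X basis
theta_x, theta_z = 0.0107, 0.0084                     # reported deviations
k3 = 543                       # secure-key cost quoted in the text
n_x = n * p_x**2               # sifted X-basis bits
n_z = n * (1 - p_x)**2         # sifted Z-basis bits
q_x = n_x / (n_x + n_z)        # resulting bias ratio

# Eq. (35) with error correction at the Shannon limit, f = 1.
l_finite = (n_x * (1 - h2(e_bx) - h2(e_bz + theta_z))
            + n_z * (1 - h2(e_bz) - h2(e_bx + theta_x))
            - k3)

if __name__ == "__main__":
    print(f"asymptotic: {l_asymptotic / 1e6:.2f} Mb, finite: {l_finite / 1e6:.2f} Mb")
```

Changing `k3` by a few hundred bits moves the finite-key result only in the fourth significant digit, in line with the observation that the phase error rate estimation dominates the finite-size cost.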

In the simulation, we assume the error correction effi- 
ciency is 100% (Shannon limit). Thus, the difference be- 
tween the "asymptotic-key" length and the "finite-key" 
length, 0.74 Mb, comes from the finite statistical analysis. 
The cost (and the security parameter) due to the finite
key analysis mainly comes from the phase error rate estimation.
Note that all the remaining cost is only $k_3 = 543$
bit and $\varepsilon_3 = 7.3 \times 10^{-10}$. This point can be clearly seen
by comparing Eqs. (|2"Tj) and (|39|) in the case of large n. 

The exponent coefficient in Eq. (f2"T|) is — 4 ( 1 _f 6 ) eb > while 

in Eq. (|39j) it is — and also a small change in 9 affects 
the key rate more than that in k^/n does. 

Figure 2 shows the lower bound for the key rate as a function of the raw key length. Note that since we use our simplified optimization method, the final security parameter $\varepsilon$ for the failure probability deviates slightly from the predefined value. Calculations show that this difference is less than 1% of the predefined values over the entire plotting range for all three curves.

FIG. 3: Minimum raw key length to yield a positive key length as a function of $\varepsilon$; parameters used: $e_{bx} = e_{bz} = 4\%$ and the error correction efficiency is 100%.

FIG. 4: Effect of the bias ratio on the final key length; parameters used: $e_{bx} = e_{bz} = 4\%$, target failure probability $\varepsilon = 10^{-7}$, the raw key length is $10^{6}$, and the error correction efficiency is 100%.

FIG. 5: Plot of the optimal bias ratio vs. the raw key length; parameters used: $e_{bx} = e_{bz} = 4\%$, target failure probability $\varepsilon = 10^{-7}$, and the error correction efficiency is 100%.

Figure [3] shows the minimum raw key length needed to 
yield a positive key length as a function of the predefined 
security parameter e. In typical applications, a rough se- 
curity level may be required for a secret key which is to 
be generated by QKD. Thus, this figure gives the min- 
imum number of signals needed to be detected in order 
to achieve such a security level. 

Figure [4] illustrates the effect of the bias ratio on the 
final key length. It can be seen that when the optimal 
bias ratio is used, the final key length increases by over 
50% compared to the case when the bias ratio of 0.5 



is used. Thus, the bias ratio has a big effect on the key 
generation performance. Figure [5] shows the optimal bias 
ratio versus the raw key length. The optimal bias ratio 
leads to the largest final key length. It can be seen that 
as the raw key length approaches infinity, the optimal 
bias ratio tends to one. This makes sense since in the 
asymptotic case, it is more efficient for Alice and Bob to 
use one basis with a high probability for key generation 
in order to avoid wasteful basis mismatch, and to use the 
other basis only for phase error estimation; this is the 
idea of the efficient BB84 protocol [4l| . The optimal bias 
ratio drops to 0.5 as the raw key length approaches the 
minimal for positive key generation. 



XIII. CONCLUDING REMARKS 

In this paper, we propose a complete post-processing 
procedure for transforming measurement outcomes in a 
QKD experiment to a final secret key, which we quantify 
with a security parameter, namely the failure probability 
of the post-processing procedure. This failure probability 
is directly connected to the composability security defini- 
tion (cf. Lemma [1]). Our post-processing procedure con- 
tains many elements including authentication, the choice 
of the basis bias ratio, error correction and verification, 
phase error rate estimation, and privacy amplification. 
Our procedure results from integrating all these elements 
with ideas from security proofs. Since the underlying se- 
curity proofs [1, Hi| |4?| are secure against the most 
general attacks, our post-processing procedure also in- 
herits this important property. Based on our analysis, 
the main contribution to the finite-size effect comes from 
the inefficiency of phase error rate estimation, which is a 
consequence of the random sampling argument for inferring
unobserved quantities from observed ones. Further
remarks and future directions are listed as follows:

1. In the privacy amplification step, Alice and Bob 
need a common matrix to generate the final se- 
cure key. The current way to construct the ma- 
trix is by Alice sending a random bit string to Bob, 
which requires authenticated classical communica- 
tion. An alternative way is by each of them gen- 
erating a matrix with a pre-shared secret key. Of 
course, the amount of pre-shared secret key bits re- 
quired must be small compared to the generated 
key length. Also, the failure probability is related 
to the amount of pre-shared bits consumed. We 
leave this investigation for future research. The 
main advantage of the second method is that no 
classical communication is needed for the privacy 
amplification part. In this case, the error verifi- 
cation step can be done either before or after the 
privacy amplification. 

2. In the security proof, the imperfection of X- and 
Z-basis measurements and efficiency mismatch are 
not considered. It is interesting to consider the de- 
tector efficiency mismatch with the finite key anal- 
ysis |Z3. 

3. As noted in Ref. [2(j, the finite-key analysis for the
decoy-state QKD is a hard problem. In the decoy- 
state QKD, the fluctuation comes from not only 
statistics but also hardware imperfections. The 
question of interest is where the main contribution 
of the fluctuation comes from and how to quantify 
these fluctuations. Since QKD systems with coher- 
ent states are most widely used in experiments, in- 
vestigating the finite key effect in decoy-state QKD 
is an important step towards a QKD standard. 

4. In order to fairly compare our finite-key analysis
to others, such as Scarani and Renner [38] and Cai
and Scarani, one has to make sure the post-processing
elements of different post-processing
procedures carry similar capacities. For example, 
there are different ways to treat the basis bias ratio, 
authentication, and random sampling. Therefore, a 
clear objective must be defined first before making 
a meaningful comparison. We remark that compar- 
ing the performance of various post-processing pro- 
cedures as a whole and comparing only the underly- 
ing security proofs (which are just one element in a 
post-processing procedure) are two different goals. 
As we have shown in this paper, the main contri- 
bution to the finite-size effect comes from random 
sampling in the parameter estimation step. Thus, 
it may be more interesting in practice to compare 
different random sampling arguments. 

5. Our analysis treats the X-basis and Z-basis sepa- 
rately, especially when we estimate the phase error 
rates using the random sampling argument. On the 
other hand, one may mix the measurement data of 



different bases before any analysis. Doing so makes
it easy to perform a similar finite-key analysis for
other protocols such as the SARG04 protocol [71].
In this case, we can use Azuma's inequality [72]
in place of the random sampling argument to estimate
the phase error rate. This is discussed in more
detail in Appendix D.

6. In QKD experiments, error correction is often performed
in blocks (say, 1 kbit) and privacy amplification
is performed on all the blocks together. In
some error correction schemes, the failure probability
for small blocks is not negligible. That is, after
the error correction, some blocks may still have
errors. Discarding these blocks may have security
implications, and thus care is required. It is an
interesting future topic to give a strict security argument
on this issue.

7. Although our analysis uses particular procedures 
for the steps (e.g., authentication, error correction), 
our analysis is generic in the sense that each specific 
procedure may be substituted by another with the 
same functionality. The new secret-key cost and 
failure probability will then be used in the analysis 
of the generation rate and failure probability of the 
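final key.

APPENDIX A: PROOF OF EQ. (g])

First, given that $\langle 0^{\otimes m}|\rho_A|0^{\otimes m}\rangle = a$, the purification $|\Psi_{AE}\rangle$ of $\rho_A$ is of the general form

$$|\Psi_{AE}\rangle = \sqrt{a}\,|0^{\otimes m}\rangle_A|0\rangle_E + \sqrt{1-a}\,|\Psi^{\perp}\rangle_{AE}, \tag{A1}$$

where $|\Psi^{\perp}\rangle_{AE}$ has unit norm and ${}_A\langle 0^{\otimes m}|\Psi^{\perp}\rangle_{AE} = 0$. Thus, the fidelity in question is

\begin{align}
F(\rho_{AE},\, |0^{\otimes m}\rangle_A\langle 0^{\otimes m}| \otimes \rho_E)
&= \mathrm{Tr}\sqrt{|\Psi_{AE}\rangle\langle\Psi_{AE}|\,\big[|0^{\otimes m}\rangle_A\langle 0^{\otimes m}| \otimes \rho_E\big]\,|\Psi_{AE}\rangle\langle\Psi_{AE}|} \notag\\
&= \mathrm{Tr}\sqrt{|\Psi_{AE}\rangle\,\big[a\langle 0|\rho_E|0\rangle\big]\,\langle\Psi_{AE}|} \notag\\
&= \sqrt{a\,\langle 0|\rho_E|0\rangle}, \tag{A2}
\end{align}

where $\rho_{AE} = |\Psi_{AE}\rangle\langle\Psi_{AE}| = \rho_{AE}^{1/2}$. Since $\rho_E = \mathrm{Tr}_A(\rho_{AE})$ and

$$\langle 0|\rho_E|0\rangle = \sum_i \big|\langle\Psi_{AE}|\,[\,|i\rangle_A|0\rangle_E]\big|^2, \tag{A3}$$

where the summation is over all vectors of a basis in system A, by considering a basis having $|0^{\otimes m}\rangle_A$ as its basis state, we get $\langle 0|\rho_E|0\rangle \ge a$. Substituting this into Eq. (A2), we get Eq. Q.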



APPENDIX B: EVALUATION OF 
HYPERGEOMETRIC FUNCTION 



Then, Eq. (|B1|) can be expressed as 

(n\ /N — n\ 
\k) \m-k) 



Po < 



{ N\ 



nl(N - n)\(N - m)!m! 



kl(n - k)\{m - k)\(N -n-m + k)\N\ 

1 V^y/N - ny/N — m^/rn 

V2n \fk\J n — k\J m — k\J N — n — m + k\^N 
n n (N - n) N ~ n (N ~ m) N -" l m m 
k k (n - k) n - k (m - k) m ~ k 
1 

' (N-n-m + k) N ~ n ~ m + k N N 

• exp(A„ + \n-h + ^N-m + A m — Aa_,— 

An-fc — Kn-k ~ AjV -n-rn+k — Ajv)- 

First, we can prove that 



An+Ajv-n + An-jh + A m — Afc — 

A n -fe — A m _fe — XN-n-m+k ~ Ajv 



< 



(B4) 



(B5) 



with the facts ofm>fc>l,n — k > 1 and Eq. (|B3|) . Re- 
mark: though the left-hand side of Eq. (|B5[) is negative, 
it is close to in the order of 0(l/12fc). 



Second, we know that l/y/x(l — x) is a decreasing 
function for < x < 1/2. Then we can easily see that 



In this appendix, we will evaluate the hypergeometric 
function 



Pg < Pi{k\m,n,N} = 



( m\ { N — m 
UJ_l n-k . 

o 



(Bl) 



with k = e^Rj,, N = n x + n z , n — n x and m = e bx (n x + 
n z ) + 9n z . Here, we relabel the function for simplicity. 
Strictly speaking, 9 is a discrete variable with a minimum 
quantum of l/n Zl which will keep m to be an integer. 

In the following discussion, we assume the integers 
N > m > k > 1 and N > n > k. The only ex- 
ception that could (though highly unlikely) happen in 
the realistic case is k = 0. In this case, for a given m, 
Po(k = 0) < Pe(k = 1). Now that we only care about 
the upper bound of the probability, we can always safely 
replace k = with k = 1 in the calculation. 

We simplify the hypergeometric function by the Stir- 
ling formula [7J| 



7! = 



'2im ( - 
- e 



(B2) 



where 



1 ^ . 1 
12n + l ™ 12n' 



(B3) 



\[
\frac{\sqrt{n}\sqrt{N-n}\sqrt{N-m}\sqrt{m}}{\sqrt{k}\sqrt{n-k}\sqrt{m-k}\sqrt{N-n-m+k}\sqrt{N}}
\le \frac{\sqrt{N}}{\sqrt{n(N-n)}}\,\frac{1}{\sqrt{e_{bx}(1-e_{bx})}}
= \frac{1}{\sqrt{N}}\,\frac{1}{\sqrt{q_x(1-q_x)e_{bx}(1-e_{bx})}} \tag{B6}
\]

using the facts that $e_{bx} = k/n$ and $e_{pz} = (m-k)/(N-n) \ge e_{bx}$. Remark: when $e_{pz} = e_{bx}$, the equality holds. From this point of view, the bound is tight.

Third, the remaining term of the failure probability can be expressed as

\[
\frac{n^{n}(N-n)^{N-n}(N-m)^{N-m}m^{m}}{k^{k}(n-k)^{n-k}(m-k)^{m-k}(N-n-m+k)^{N-n-m+k}N^{N}} = 2^{-N\xi_x(\theta)} \tag{B7}
\]

where we use the definition of the binary entropy function $H(x)$. The exponent coefficient is given by

\[
\xi_x(\theta) = H(e_{bx} + \theta - q_x\theta) - q_x H(e_{bx}) - (1-q_x)H(e_{bx} + \theta) \tag{B8}
\]

with $q_x = n/N$ and $(m-k)/(N-n) = e_{bx} + \theta$. Note that $m/N = e_{bx} + \theta - q_x\theta$ is the weighted average of $e_{bx}$ and $e_{bx} + \theta$ with weights $q_x$ and $1 - q_x$. Due to the concavity of $H(x)$, $\xi_x(\theta)$ is therefore positive for $\theta > 0$ and $0 < q_x < 1$.

Therefore, by combining Eqs. (B4), (B5), (B6) and (B7), the failure probability of Eq. (15) is bounded by

\[
P_\theta < \frac{1}{\sqrt{N q_x(1-q_x)e_{bx}(1-e_{bx})}}\, 2^{-N\xi_x(\theta)} \tag{B9}
\]

where $\xi_x(\theta)$ is given by Eq. (B8). Note that $\xi_x(\theta)$ is independent of the key size $N$ given the error rates and bias ratio. Hence the failure probability decreases exponentially with $N$ (in fact slightly faster, owing to the $1/\sqrt{N}$ prefactor).
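As a numerical sanity check of Eq. (B9), the following Python sketch compares the exact hypergeometric term of Eq. (B1) with the bound. The sample sizes and error counts are hypothetical values chosen only for illustration, and the function names are illustrative rather than part of the original analysis. The sketch assumes the regime of this appendix, namely $k \ge 1$ and $e_{bx} \le e_{pz} < 1/2$.

```python
from math import comb, log2, sqrt


def binary_entropy(x):
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)


def hypergeom_term(k, m, n, N):
    """Exact value of C(m,k) C(N-m,n-k) / C(N,n), as in Eq. (B1)."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def xi_x(e_bx, theta, q_x):
    """Exponent coefficient of Eq. (B8)."""
    return (binary_entropy(e_bx + theta - q_x * theta)
            - q_x * binary_entropy(e_bx)
            - (1.0 - q_x) * binary_entropy(e_bx + theta))


def b9_bound(k, m, n, N):
    """Right-hand side of Eq. (B9), computed from the integer counts."""
    e_bx = k / n                       # bit error rate in the sampled basis
    q_x = n / N                        # bias ratio
    theta = (m - k) / (N - n) - e_bx   # deviation of the phase error rate
    prefactor = 1.0 / sqrt(N * q_x * (1.0 - q_x) * e_bx * (1.0 - e_bx))
    return prefactor * 2.0 ** (-N * xi_x(e_bx, theta, q_x))


# Hypothetical example: n_x = n_z = 1000, 30 errors in the sample,
# 50 errors in the remaining bits (e_bx = 0.03, e_pz = 0.05).
n_x, n_z = 1000, 1000
k = 30
N, n = n_x + n_z, n_x
m = k + 50

exact = hypergeom_term(k, m, n, N)
bound = b9_bound(k, m, n, N)
print(f"exact hypergeometric term: {exact:.3e}")
print(f"bound of Eq. (B9):         {bound:.3e}")
```

Scaling all counts by a common factor leaves $\xi_x(\theta)$ unchanged, so rerunning the sketch with larger hypothetical samples shows the exponential decay with $N$ directly.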



APPENDIX C: PROOF OF EQ. (29)

We prove Eq. (29) by the following claim.

Claim 1.

\[
\sum_{k=0}^{m-1}\binom{n}{k} < \binom{n}{m} \tag{C1}
\]

when $m < n/3$.

Proof. First notice that

\[
\frac{\binom{n}{k-1}}{\binom{n}{k}} = \frac{k}{n-k+1} < \frac{1}{2} \tag{C2}
\]

is true for all $k < n/3$. Thus,

\[
\sum_{k=0}^{m-1}\binom{n}{k} \le \sum_{k=0}^{m-1} 2^{k-m}\binom{n}{m} = \binom{n}{m}\sum_{k=0}^{m-1} 2^{k-m} < \binom{n}{m} \tag{C3}
\]

is true for $m < n/3$. □

APPENDIX D: ESTIMATION OF PHASE ERROR RATE FOR MIXED-BASIS ANALYSIS

The analysis in the main part of the paper treats each of the two bases separately when estimating the phase error rates for them. This is possible in BB84, since the phase errors in one basis are the bit errors in the other basis. In this case, a random sampling argument suffices to establish some confidence on the unmeasured phase error rate in one basis from the measured bit error rate in the other basis. On the other hand, one may mix all the measurement data in the different bases together before applying any further analysis. This can be done in BB84. For other protocols, this mixing actually leads to a simpler analysis and thus is favorable. Here, we describe how to estimate the phase error rate for the mixed-basis case. When the measurements are mixed, protocols can usually be characterized by a relation between the bit and phase error probabilities, $p_p = a p_b$, where $a \ge 1$ in general (e.g., $a = 3/2$ for SARG04 [H Izl and $a = 5/4$ for a three-state protocol [771]). (Note that here the error probabilities are the combined values of all bases and thus do not carry a basis designation.) Given such a relation in probabilities, we want to establish a similar relation for the error rates and compute the confidence for it. A useful tool for this is Azuma's inequality [lH (see also Refs. [zl [ll, [73], [ll] for its application to security proofs), which relates the sum of conditional probabilities to the total number of a particular outcome in many trials. To start, we relate the probability and the rate for the bit error and the phase error separately using Azuma's inequality as follows:

\[
\Pr\{|p_b - e_b| > \epsilon_{Az}\} < 2\exp\left(-\frac{n\epsilon_{Az}^2}{2}\right) \tag{D1}
\]

\[
\Pr\{|p_p - e_p| > \epsilon_{Az}\} < 2\exp\left(-\frac{n\epsilon_{Az}^2}{2}\right) \tag{D2}
\]

where $p_{b,p}$ and $e_{b,p}$ designate the error probabilities and the error rates respectively, $\epsilon_{Az}$ is the allowed deviation that sets the failure probability, and $n$ is the number of measurements made. Because $p_p = a p_b$, we multiply these two inequalities to get the relation between the bit and phase error rates:

\[
\Pr\{|e_p - a e_b| > (1 + a)\epsilon_{Az}\} < 4\exp(-n\epsilon_{Az}^2). \tag{D3}
\]

For BB84, $a = 1$ and this bound is worse than the random sampling result (cf. Eq. (23)) in typical situations.
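To make the use of the Azuma-based relation (D3) concrete, the following Python sketch inverts it for a target failure probability and returns the resulting upper bound $a e_b + (1 + a)\epsilon_{Az}$ on the phase error rate. The numbers ($n$, the bit error rate, the target failure probability) are hypothetical, and the function names are illustrative. For comparison, the sketch also evaluates the plain union bound of (D1) and (D2), $4\exp(-n\epsilon_{Az}^2/2)$, which is a more conservative alternative and is not the expression used in (D3).

```python
from math import exp, log, sqrt


def azuma_failure_d3(n, eps_az):
    """Right-hand side of Eq. (D3): 4 exp(-n eps^2)."""
    return 4.0 * exp(-n * eps_az ** 2)


def azuma_failure_union(n, eps_az):
    """Union bound of (D1) and (D2): 2 exp(-n eps^2 / 2), counted twice."""
    return 4.0 * exp(-n * eps_az ** 2 / 2.0)


def eps_for_failure(n, failure):
    """Invert Eq. (D3): deviation eps_Az that yields the target failure probability."""
    return sqrt(log(4.0 / failure) / n)


def phase_error_upper_bound(e_b, a, n, failure):
    """Upper bound a*e_b + (1+a)*eps_Az on the phase error rate e_p."""
    return a * e_b + (1.0 + a) * eps_for_failure(n, failure)


# Hypothetical example: SARG04-like relation a = 3/2, one million
# measurements, 2% bit error rate, target failure probability 1e-10.
n = 10 ** 6
a = 1.5
e_b = 0.02
failure = 1e-10

eps = eps_for_failure(n, failure)
bound = phase_error_upper_bound(e_b, a, n, failure)
print(f"eps_Az              = {eps:.5f}")
print(f"phase error bound   = {bound:.5f}")
print(f"union-bound failure = {azuma_failure_union(n, eps):.3e}")
```

Because $\epsilon_{Az}$ scales as $1/\sqrt{n}$, the correction term $(1 + a)\epsilon_{Az}$ dominates for small data sizes, which is consistent with the remark that this bound is weaker than the random sampling result for BB84.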
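Claim 1 of Appendix C can also be spot-checked by direct enumeration. The Python sketch below tests the inequality for every pair $(n, m)$ with $m < n/3$ up to a small, arbitrarily chosen limit, and shows one case outside that range where the inequality fails. It is only a sanity check under these assumptions, not a substitute for the proof.

```python
from math import comb


def claim1_holds(n, m):
    """Check sum_{k=0}^{m-1} C(n,k) < C(n,m) for given integers n and m."""
    return sum(comb(n, k) for k in range(m)) < comb(n, m)


# Exhaustive check over all m < n/3 up to a hypothetical limit N_MAX.
N_MAX = 200
violations = [
    (n, m)
    for n in range(1, N_MAX + 1)
    for m in range(0, n)
    if 3 * m < n and not claim1_holds(n, m)
]
print("violations for m < n/3:", violations)

# Outside the stated range the inequality need not hold, e.g. m = n/2.
print("n = 10, m = 5 holds:", claim1_holds(10, 5))
```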